By Katherine Tansey
Data about financial contributions to the 2016 US Presidential Campaigns for the state of New Jersey was downloaded on the 1st of September 2015 from http://fec.gov/disclosurep/PDownload.do.
There are multiple candidates for each party (Republican and Democrat) still in the running for the party nomination, so there are more than two candidates at the moment.
For the past 6 elections (since 1992), New Jersey has voted Democrat, and it will be interesting to see how this may affect current political leanings.
Load the data and relevant libraries. Examine the data for summary information.
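A minimal sketch of this loading and inspection step, assuming base R only. The file name in the comment is hypothetical, and a two-row inline sample (taken from the output below) stands in for the real download so that the snippet runs on its own.

```r
# Hypothetical file name; the real download would be read with something like
#   NJ <- read.csv("NJ_contributions.csv", stringsAsFactors = TRUE)
# Here a tiny inline sample stands in for the file.
sample_csv <- paste(
  'cand_nm,contbr_nm,contbr_city,contb_receipt_amt,contb_receipt_dt',
  '"Clinton, Hillary Rodham","STRINGER, KRISTINE",SOUTH ORANGE,250,12-Apr-15',
  '"Clinton, Hillary Rodham","CROTTY, SHEILA",CLIFTON,100,27-Apr-15',
  sep = "\n"
)
NJ <- read.csv(text = sample_csv, stringsAsFactors = TRUE)

head(NJ)     # first rows
names(NJ)    # column names
dim(NJ)      # number of rows and columns
str(NJ)      # column types
summary(NJ)  # per-column summary
```

Setting `stringsAsFactors = TRUE` reproduces the factor columns seen in the `str()` output; newer versions of R read text columns as character by default.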
## cmte_id cand_id cand_nm contbr_nm
## 1 C00575795 P00003392 Clinton, Hillary Rodham STRINGER, KRISTINE
## 2 C00575795 P00003392 Clinton, Hillary Rodham CROTTY, SHEILA
## 3 C00575795 P00003392 Clinton, Hillary Rodham MITZMAN, THEA
## 4 C00575795 P00003392 Clinton, Hillary Rodham YURT, NURAY
## 5 C00575795 P00003392 Clinton, Hillary Rodham NICOLO, MARIA
## 6 C00575795 P00003392 Clinton, Hillary Rodham TALLAJ, RAMON
## contbr_city contbr_st contbr_zip contbr_employer contbr_occupation
## 1 SOUTH ORANGE NJ 70792116 SELF-EMPLOYED ATTORNEY
## 2 CLIFTON NJ 70121939 N/A NOT EMPLOYED
## 3 CALDWELL NJ 70071406 N/A HOMEMAKER
## 4 PISCATAWAY NJ 88544546 NOVARTIS DIRECTOR
## 5 TITUSVILLE NJ 85601724 SELF-EMPLOYED INFORMATION REQUESTED
## 6 PARAMUS NJ 76525505 SELF-EMPLOYED PHYSICIAN
## contb_receipt_amt contb_receipt_dt receipt_desc memo_cd memo_text
## 1 250 12-Apr-15
## 2 100 27-Apr-15
## 3 2700 29-May-15
## 4 2700 27-Apr-15
## 5 2700 29-Jun-15
## 6 2700 30-Apr-15
## form_tp file_num tran_id election_tp
## 1 SA17A 1015585 C19928 P2016
## 2 SA17A 1015585 C87019 P2016
## 3 SA17A 1015585 C176829 P2016
## 4 SA17A 1015585 C77059 P2016
## 5 SA17A 1015585 C292569 P2016
## 6 SA17A 1015585 C88649 P2016
## [1] "cmte_id" "cand_id" "cand_nm"
## [4] "contbr_nm" "contbr_city" "contbr_st"
## [7] "contbr_zip" "contbr_employer" "contbr_occupation"
## [10] "contb_receipt_amt" "contb_receipt_dt" "receipt_desc"
## [13] "memo_cd" "memo_text" "form_tp"
## [16] "file_num" "tran_id" "election_tp"
## [1] 2435 18
## 'data.frame': 2435 obs. of 18 variables:
## $ cmte_id : Factor w/ 14 levels "C00458844","C00500587",..: 6 6 6 6 6 6 6 6 6 10 ...
## $ cand_id : Factor w/ 14 levels "P00003392","P20002721",..: 1 1 1 1 1 1 1 1 1 10 ...
## $ cand_nm : Factor w/ 15 levels "Bush, Jeb","Carson, Benjamin S.",..: 3 3 3 3 3 3 3 3 3 10 ...
## $ contbr_nm : Factor w/ 1353 levels "ABDELAZIZ, AL",..: 1208 230 842 1344 895 1223 755 74 894 260 ...
## $ contbr_city : Factor w/ 346 levels "ALLENDALE","ALLENHURST",..: 289 57 43 247 302 234 100 194 174 131 ...
## $ contbr_st : Factor w/ 1 level "NJ": 1 1 1 1 1 1 1 1 1 1 ...
## $ contbr_zip : int 70792116 70121939 70071406 88544546 85601724 76525505 70245022 70423025 77462751 7205 ...
## $ contbr_employer : Factor w/ 692 levels "","A&E STORES",..: 562 415 415 450 562 562 415 122 592 562 ...
## $ contbr_occupation: Factor w/ 445 levels "","ACADEMIC",..: 23 281 206 118 214 314 206 241 339 314 ...
## $ contb_receipt_amt: num 250 100 2700 2700 2700 2700 2700 2700 50 2700 ...
## $ contb_receipt_dt : Factor w/ 117 levels "01-Apr-15","01-Jun-15",..: 40 98 109 98 107 110 28 95 28 107 ...
## $ receipt_desc : Factor w/ 10 levels "","REATTRIBUTION / REDESIGNATION REQUESTED (AUTOMATIC)",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ memo_cd : Factor w/ 2 levels "","X": 1 1 1 1 1 1 1 1 1 1 ...
## $ memo_text : Factor w/ 21 levels "","* EARMARKED CONTRIBUTION: SEE BELOW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ form_tp : Factor w/ 3 levels "SA17A","SA18",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ file_num : int 1015585 1015585 1015585 1015585 1015585 1015585 1015585 1015585 1015585 1015617 ...
## $ tran_id : Factor w/ 2435 levels "A032126DA61AA40D699B",..: 491 1215 436 1167 900 1220 513 836 516 2108 ...
## $ election_tp : Factor w/ 2 levels "G2016","P2016": 2 2 2 2 2 2 1 2 2 2 ...
## cmte_id cand_id cand_nm
## C00575795:1083 P00003392:1083 Clinton, Hillary Rodham :1083
## C00577130: 276 P60007168: 276 Sanders, Bernard : 276
## C00574624: 259 P60006111: 259 Carson, Benjamin S. : 230
## C00573519: 230 P60005915: 230 Cruz, Rafael Edward 'Ted': 230
## C00458844: 208 P60006723: 208 Rubio, Marco : 208
## C00575449: 170 P40003576: 170 Paul, Rand : 170
## (Other) : 209 (Other) : 209 (Other) : 238
## contbr_nm contbr_city contbr_st contbr_zip
## SACKS-WILNER, TOM : 19 PRINCETON : 66 NJ:2435 Min. : 7008
## LORENZO, CAREY : 15 HOBOKEN : 54 1st Qu.:70783015
## SPAIR SR, RICHAERD: 15 MONTCLAIR : 49 Median :77242352
## EDWARDS, DIANE : 14 WEST ORANGE: 48 Mean :74451818
## HESS, CHARLES W. : 14 MORRISTOWN : 47 3rd Qu.:80572352
## STORCH, EVELYN : 14 CHERRY HILL: 42 Max. :89042725
## (Other) :2344 (Other) :2129
## contbr_employer
## RETIRED : 338
## SELF-EMPLOYED : 214
## N/A : 207
## NOT EMPLOYED : 96
## INFORMATION REQUESTED PER BEST EFFORTS: 89
## (Other) :1489
## NA's : 2
## contbr_occupation contb_receipt_amt
## RETIRED : 444 Min. :-5000.0
## ATTORNEY : 150 1st Qu.: 50.0
## NOT EMPLOYED : 108 Median : 143.5
## INFORMATION REQUESTED PER BEST EFFORTS: 80 Mean : 669.4
## HOMEMAKER : 72 3rd Qu.: 1000.0
## (Other) :1580 Max. : 5400.0
## NA's : 1
## contb_receipt_dt
## 30-Jun-15: 189
## 29-Jun-15: 93
## 12-Apr-15: 82
## 23-Jun-15: 72
## 26-Jun-15: 68
## 12-Jun-15: 62
## (Other) :1869
## receipt_desc memo_cd
## :2402 :2378
## Refund : 9 X: 57
## REATTRIBUTION / REDESIGNATION REQUESTED (AUTOMATIC): 5
## REATTRIBUTION FROM SPOUSE : 3
## REATTRIBUTION TO SPOUSE : 3
## REDESIGNATION FROM PRIMARY : 3
## (Other) : 10
## memo_text form_tp
## :2108 SA17A:2394
## * EARMARKED CONTRIBUTION: SEE BELOW : 255 SA18 : 32
## EARMARKED FROM MAKE DC LISTEN : 35 SB28A: 9
## REATTRIBUTION / REDESIGNATION REQUESTED (AUTOMATIC): 5
## REATTRIBUTION FROM SPOUSE : 3
## REATTRIBUTION TO SPOUSE : 3
## (Other) : 26
## file_num tran_id election_tp
## Min. :1003942 A032126DA61AA40D699B: 1 G2016: 31
## 1st Qu.:1015509 A03612478EFDA491AB11: 1 P2016:2404
## Median :1015585 A04059564B8CB422CA72: 1
## Mean :1015272 A06CBD04D2CBB4D29B7B: 1
## 3rd Qu.:1015585 A06F4FE70F5794854B7D: 1
## Max. :1015715 A0B430521A50B4B038B3: 1
## (Other) :2429
What do the variables in the data mean?
CMTE_ID = COMMITTEE ID
CAND_ID = CANDIDATE ID
CAND_NM = CANDIDATE NAME
CONTBR_NM = CONTRIBUTOR NAME
CONTBR_CITY = CONTRIBUTOR CITY
CONTBR_ST = CONTRIBUTOR STATE
CONTBR_ZIP = CONTRIBUTOR ZIP CODE
CONTBR_EMPLOYER = CONTRIBUTOR EMPLOYER
CONTBR_OCCUPATION = CONTRIBUTOR OCCUPATION
CONTB_RECEIPT_AMT = CONTRIBUTION RECEIPT AMOUNT
CONTB_RECEIPT_DT = CONTRIBUTION RECEIPT DATE
RECEIPT_DESC = RECEIPT DESCRIPTION
MEMO_CD = MEMO CODE
MEMO_TEXT = MEMO TEXT
FORM_TP = FORM TYPE
FILE_NUM = FILE NUMBER
TRAN_ID = TRANSACTION ID
ELECTION_TP = ELECTION TYPE/PRIMARY GENERAL INDICATOR
I’m going to start by just getting to know the data, as I’ve never worked with it before. I find this easiest by plotting the variables and getting some summary information for them.
Examine the variable cmte_id
Most people are donating to one committee predominantly.
Most people are donating to one candidate predominantly.
Most people are donating to Hillary Clinton (Democrat), which I think is the same information captured in the previous two plots.
From the plot, I noticed an issue with one of the candidates’ names. Ted Cruz is listed twice (once in all upper case). If not corrected, this would lead us to incorrect conclusions about the data. It can be easily fixed.
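One way such a fix might look, sketched on a toy factor rather than the real column. This is an assumption about the approach, not necessarily the code used here.

```r
# Toy factor with the same duplicate spelling seen in the data.
cand_nm <- factor(c("Cruz, Rafael Edward 'Ted'",
                    "CRUZ, RAFAEL EDWARD TED",
                    "Paul, Rand"))

# Reassign the upper-case rows to the existing mixed-case level.
cand_nm[cand_nm == "CRUZ, RAFAEL EDWARD TED"] <- "Cruz, Rafael Edward 'Ted'"

table(cand_nm)              # the old level remains, with a count of 0
table(droplevels(cand_nm))  # droplevels() removes the now-empty level
```

Reassigning values leaves the old level in the factor with a count of 0, which is the pattern visible in the table for this section; `droplevels()` would remove it.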
##
## Bush, Jeb Carson, Benjamin S.
## 114 230
## Clinton, Hillary Rodham Cruz, Rafael Edward 'Ted'
## 1083 259
## CRUZ, RAFAEL EDWARD TED Fiorina, Carly
## 0 19
## Graham, Lindsey O. Huckabee, Mike
## 25 7
## O'Malley, Martin Joseph Pataki, George E.
## 16 17
## Paul, Rand Perry, James R. (Rick)
## 170 2
## Rubio, Marco Sanders, Bernard
## 208 276
## Santorum, Richard J.
## 9
Looks like it is fixed. Let’s remake the plot.
There are many different contributors but there are some that contribute more than once. But this plot is too full to make much sense of the data. I’ll count the number of times (frequency) each individual contributor name occurs (new variable called “count_NM”).
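The counting pattern used for this and the following variables can be sketched in base R as below. The toy names are made up, and the exact calls are an assumption; only the `Var1`/`Freq` column names are taken from the output.

```r
# Toy contributor names; the real column is contbr_nm.
contbr_nm <- factor(c("A", "B", "A", "C", "A", "B"))

# table() counts occurrences; as.data.frame() gives a count column named Freq.
count_NM <- as.data.frame(table(contbr_nm))
names(count_NM)[1] <- "Var1"  # match the column name shown in the output

dim(count_NM)                         # number of distinct names
dim(subset(count_NM, Freq == 1))      # names that occur exactly once
dim(subset(count_NM, Freq > 10))      # names that occur more than 10 times
summary(count_NM$Freq)                # distribution of the counts
count_NM[which.max(count_NM$Freq), ]  # most frequent name
```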
## [1] 1353 2
How many contributors appear only once?
## [1] 929 2
How many individuals have made more than 10 contributions?
## [1] 11 2
Who is the most frequent contributor?
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.0 1.0 1.0 1.8 2.0 19.0
## Var1 Freq
## 1062 SACKS-WILNER, TOM 19
The new variable is a count of the number of times each individual contributor name occurs, and this can be plotted to look at the distribution.
From the histogram of the number of times an individual donated, we can see that most people donate only once, and few donate more than 5 times. There are 1353 unique donors in the file of 2435 donations. Of those, 929 have donated only once, and the maximum number of donations by a single person is 19 (listed as SACKS-WILNER, TOM). Only 11 individuals have donated more than 10 times.
The plot is too full to make much sense of it, but it is clear that some cities have more people making campaign donations than others. I’ll again count the number of occurrences of each city (new variable called “count_CITY”).
## [1] 346 2
How many cities are listed only once?
## [1] 68 2
How many cities have made over 10 contributions?
## [1] 60 2
Which city is the most frequently listed (city with most contributors)?
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 2.000 4.000 7.038 8.000 66.000
## Var1 Freq
## 255 PRINCETON 66
This new variable can also be plotted to look at the distribution.
Many cities have only a few donations, but there are some very active cities, with the most frequently listed city (Princeton) appearing 66 times. There are 346 unique cities in the file of 2435 donations. Of those, 68 are listed only once and 60 are listed more than 10 times.
Does the zip code variable give any additional information? I’ll again count the number of observations of a zip code (new variable called “count_ZIP”).
## [1] 1219 2
How many zip codes are listed only once?
## [1] 709 2
How many zip codes are listed in the dataset more than 10 times?
## [1] 13 2
Which zip code is the most frequently listed (zip code with most contributors)?
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 1.000 1.000 1.998 2.000 19.000
## Var1 Freq
## 922 80559348 19
This new variable can also be plotted to look at the distribution.
It does. The tail does not go as high as for the city variable (city max is 66, zip max is 19), suggesting that zip code, while obviously closely related to city, gives slightly different information: there are more distinct zip codes in the dataset (1219) than cities (346). The zip code with the largest number of donations is 08055-9348, which is in Medford, whereas the city with the most donations was Princeton. Perhaps zip code offers a finer resolution of an individual’s location, placing them within a city rather than just naming the city.
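Because the zip codes were read as integers, the leading zero of New Jersey zip codes is lost (80559348 in the output corresponds to 08055-9348). A hedged sketch of how the values could be restored to a readable form follows. `format_zip` is a hypothetical helper, and it assumes that values with up to 5 digits are plain 5-digit zips and longer values are ZIP+4 codes.

```r
# Hypothetical helper: restore leading zeros lost when zips are read as integers.
# Assumption: up to 5 digits means a 5-digit zip, otherwise a 9-digit ZIP+4.
format_zip <- function(zip) {
  z <- as.character(as.integer(zip))
  width <- ifelse(nchar(z) <= 5, 5, 9)
  padded <- paste0(strrep("0", width - nchar(z)), z)  # left-pad with zeros
  zip5 <- substr(padded, 1, 5)
  plus4 <- substr(padded, 6, 9)
  ifelse(width == 9, paste0(zip5, "-", plus4), zip5)
}

format_zip(c(80559348, 7205, 70792116))
```

Taking only the first five digits of the padded value would give a coarser grouping that sits between the full zip code and the city.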
I will assess employer information in the same way, creating a new variable called “count_EMPLOYER” which counts the frequency of each individual employer listed.
## [1] 692 2
Count the number of employers listed only once.
## [1] 446 2
How many employers are listed more than 10 times?
## [1] 18 2
List the most frequently listed employer from the contributors.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 1.000 1.000 3.516 2.000 338.000
## Var1 Freq
## 529 RETIRED 338
There are 692 unique employers listed, most of which (446) appear only once. The most frequently listed employer is Retired, which appears 338 times. This variable may not be as relevant as CONTBR_OCCUPATION.
Occupation will be assessed the same as the others, creating a new variable called “count_OCCUPATION” which counts the frequency of each individual occupation listed.
## [1] 445 2
Count the number of occupations listed only once.
## [1] 224 2
How many occupations are listed more than 10 times?
## [1] 30 2
List the most frequently named occupation for the contributors, and the top 10 most frequent occupations listed.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 1.00 1.00 5.47 3.00 444.00
## Var1 Freq
## 359 RETIRED 444
## Var1 Freq
## 359 RETIRED 444
## 23 ATTORNEY 150
## 281 NOT EMPLOYED 108
## 215 INFORMATION REQUESTED PER BEST EFFORTS 80
## 206 HOMEMAKER 72
## 96 CONSULTANT 64
## 214 INFORMATION REQUESTED 62
## 314 PHYSICIAN 52
## 241 LAWYER 41
## 49 CEO 38
Interestingly, this variable lists more donations coming from Retired individuals than the employer variable does (444 versus 338). There are fewer unique occupations than employers, but still a large number (445). It would be nice to see this broken down into even broader categories, and I might think about how best to handle this information. One possibility is to investigate just the occupations with the greatest number of contributors (say the top 10).
The histogram tells us that most people are giving small amounts, with some larger donations. There is also a peak just under $3000. This is likely $2,700 which is the limit an individual may give to an individual candidate, and thus the peak is signifying the maximum contribution. Two things are very interesting: (1) There are donations above the limit of $2700. (2) There are some negative amounts in the contributions.
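A sketch of how the two unusual groups could be pulled out and their receipt descriptions tabulated, assuming base R. The rows below are made-up toy values, not records from the file.

```r
# Toy data with made-up amounts; column names follow the dataset.
NJ <- data.frame(
  contb_receipt_amt = c(250, -2700, 5400, 2700, -100, 5000),
  receipt_desc = factor(c("", "Refund", "", "", "Refund", "SEE REATTRIBUTION"))
)

negative   <- subset(NJ, contb_receipt_amt < 0)     # refunds and similar
over_limit <- subset(NJ, contb_receipt_amt > 2700)  # above the $2,700 limit

dim(negative)
dim(over_limit)

# subset() keeps every factor level, so unused descriptions show up with Freq 0.
as.data.frame(table(negative$receipt_desc))
as.data.frame(table(over_limit$receipt_desc))
```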
## [1] 22 18
## [1] 7 18
## Var1 Freq
## 1 5
## 2 REATTRIBUTION / REDESIGNATION REQUESTED (AUTOMATIC) 0
## 3 REATTRIBUTION FROM SPOUSE 0
## 4 REATTRIBUTION TO SPOUSE 3
## 5 REDESIGNATION FROM PRIMARY 0
## 6 REDESIGNATION FROM SENATE GENERAL 0
## 7 REDESIGNATION TO GENERAL 3
## 8 REDESIGNATION TO PRESIDENTIAL GENERAL 2
## 9 Refund 9
## 10 SEE REATTRIBUTION 0
## Var1 Freq
## 1 4
## 2 REATTRIBUTION / REDESIGNATION REQUESTED (AUTOMATIC) 1
## 3 REATTRIBUTION FROM SPOUSE 0
## 4 REATTRIBUTION TO SPOUSE 0
## 5 REDESIGNATION FROM PRIMARY 0
## 6 REDESIGNATION FROM SENATE GENERAL 0
## 7 REDESIGNATION TO GENERAL 0
## 8 REDESIGNATION TO PRESIDENTIAL GENERAL 0
## 9 Refund 0
## 10 SEE REATTRIBUTION 2
There are 22 donations below 0 and 7 above the federally set maximum of $2,700. Almost all the donations below 0 are refunds. Some of the ones above $2,700 list reattribution, which potentially means putting part of the donation in someone else’s name, but for the majority the receipt description is blank.
It makes sense to look at the data in the range of 0 to 2700, which is the allowable range for donations.
What does this look like transformed? Does the second peak go away?
It doesn’t look much better than the original. This is most likely due to the ceiling effect producing an odd peak no matter the transformation. I also tried log2 and square root, but they didn’t look any better.
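A small sketch of why no such transformation removes the second peak, using made-up amounts. It assumes the data are restricted to values above 0 (so the logarithm is defined) and at most 2700.

```r
# Toy amounts (made up), including values outside the allowable range.
amt <- c(25, 50, 100, 250, 1000, 2700, 2700, -500, 5400)

# Keep the allowable range; values must be above 0 for a log transform.
in_range <- amt[amt > 0 & amt <= 2700]

log_amt  <- log10(in_range)
sqrt_amt <- sqrt(in_range)

# Donations at the cap are tied before and after any monotone transform,
# so the spike at the maximum moves but does not spread out.
sum(in_range == max(in_range))
sum(log_amt == max(log_amt))
sum(sqrt_amt == max(sqrt_amt))
```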
This is a busy bar graph, but it clearly shows that people donate on certain dates more than others, with one date in particular standing out. I’ll create a new variable called “count_DATE” which counts the frequency of each individual date listed.
## [1] 117 2
Count the number of dates listed only once.
## [1] 15 2
How many dates are listed more than 10 times in the dataset?
## [1] 67 2
List the most frequently listed date in the dataset.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 5.00 14.00 20.81 30.00 189.00
## Var1 Freq
## 112 30-Jun-15 189
That date is the 30th of June 2015, with 189 donations made on that day. A quick Google search tells me that on that day the governor of NJ (Chris Christie) declared his candidacy for the US presidential election. However, Chris Christie is not one of the candidates named in the data as receiving donations, so maybe the announcement spurred people on to contribute to rival campaigns. It will be interesting to see how the candidates’ campaigns received donations over time.
Most of these are relatively small in value.
Like the receipt description, most of these are relatively small in value.
SA17A is for individual contributions, which most of this data is. SA18 is transfers from other authorized committees, and SB28A are refunds to individuals.
There are two types of elections. Most donations are for P2016, which is Primary 2016, but a few are designated for G2016, which is the General 2016 election. It is slightly confusing that people could contribute to the general election, since the primaries are not yet completed and candidates have not been chosen for it.
Since there are multiple candidates at the current moment, it would be good to know which candidate belongs to which party, so I’m going to make a new variable called cand_party. I’m going to table the frequency of each candidate name in the dataset.
## Var1 Freq
## 1 Bush, Jeb 114
## 2 Carson, Benjamin S. 230
## 3 Clinton, Hillary Rodham 1083
## 4 Cruz, Rafael Edward 'Ted' 259
## 5 CRUZ, RAFAEL EDWARD TED 0
## 6 Fiorina, Carly 19
## 7 Graham, Lindsey O. 25
## 8 Huckabee, Mike 7
## 9 O'Malley, Martin Joseph 16
## 10 Pataki, George E. 17
## 11 Paul, Rand 170
## 12 Perry, James R. (Rick) 2
## 13 Rubio, Marco 208
## 14 Sanders, Bernard 276
## 15 Santorum, Richard J. 9
There are fourteen candidates, listed below with their party affiliation (found via Google):
Bush, Jeb: Republican
Carson, Benjamin S.: Republican
Clinton, Hillary Rodham: Democrat
Cruz, Rafael Edward ‘Ted’: Republican
Fiorina, Carly: Republican
Graham, Lindsey O.: Republican
Huckabee, Mike: Republican
O’Malley, Martin Joseph: Democrat
Pataki, George E.: Republican
Paul, Rand: Republican
Perry, James R. (Rick): Republican
Rubio, Marco: Republican
Sanders, Bernard: Democrat
Santorum, Richard J.: Republican
There are more Republicans than Democrats.
I’m going to make a new variable that distinguishes which party each candidate belongs to and call this “cand_party”. Then I’m going to table this new variable to see how often they occur in the dataset.
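A minimal sketch of one way to build such a variable, assuming that every candidate who is not one of the three Democrats listed above is a Republican. The `democrats` vector is a name introduced here for illustration.

```r
# The three Democrats from the list above; everyone else is treated as Republican.
democrats <- c("Clinton, Hillary Rodham",
               "O'Malley, Martin Joseph",
               "Sanders, Bernard")

# Toy candidate column; the real one is NJ$cand_nm.
cand_nm <- factor(c("Clinton, Hillary Rodham", "Bush, Jeb",
                    "Sanders, Bernard", "Paul, Rand"))

cand_party <- ifelse(cand_nm %in% democrats, "Democrat", "Republican")
table(cand_party)
```

`ifelse()` returns a character vector here, which matches the `chr` type shown for `cand_party` in the later `str()` output.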
##
## Democrat Republican
## 1375 1060
Despite there being more Republican candidates, more donations have gone to Democrats (1375 versus 1060). I can plot this new variable.
I’ve now looked at all the variables in the dataset individually. This was a good way to get to know the data. Most people in NJ are making contributions to Democrats, with the most contributions going to Hillary Clinton. Most donations are small, but there is a peak at the ceiling of $2700 (the maximum allowed). I also noticed some amounts above that, as well as negative numbers, which seemed to reflect mostly refunds. Occupation data, while interesting, was sparse, as not everyone had given this information. Furthermore, it was not broken into broad enough categories to be fruitful going forward (over 400 categories). However, a lot of them appear only once, and restricting to the top 10 occupations listed may still be of interest.
I’m mostly interested in seeing which candidates get the largest amount of money, whether there is a difference in amounts by party affiliation, and whether amount and candidate differ by occupation or location.
Examine if the donation amounts differ by party affiliation and candidate. First look at party affiliation.
Let’s look at this without the donations we think are outliers.
Democrats appear to get more donations than Republicans, but we can quantify this with the data. Let’s calculate the mean (and standard deviation) of donation amount by party affiliation and the total donation amount raised by each party.
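The grouped summary can be sketched in base R as below. The amounts are made up, and `NJ_money_by_party` is a hypothetical name chosen by analogy with the per-candidate table created later.

```r
# Toy data with made-up amounts.
NJ <- data.frame(
  cand_party = c("Democrat", "Democrat", "Republican", "Republican"),
  contb_receipt_amt = c(100, 300, 50, 150)
)

# Split the amounts by party, then summarise each group.
amt_by_party <- split(NJ$contb_receipt_amt, NJ$cand_party)
NJ_money_by_party <- data.frame(
  cand_party = names(amt_by_party),
  mean = sapply(amt_by_party, mean),
  sd   = sapply(amt_by_party, sd),
  sum  = sapply(amt_by_party, sum),
  row.names = NULL
)
NJ_money_by_party
```

The same pattern, splitting on `cand_nm` instead of `cand_party`, would give the per-candidate summary.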
## cand_party mean sd sum
## 1 Democrat 755.1083 1057.661 1038273.9
## 2 Republican 558.1716 1021.008 591661.9
Democrats have a higher mean for donation amount and a higher total donation amount. So they are raising more money than the Republican party in New Jersey.
We can plot the donation amount for each individual candidate by party.
There is quite a spread among the Republican candidates: some receive small donation amounts on average with some larger outliers (i.e. Carson, Cruz, Paul, Rubio), some received only large donations, though apparently few of them (i.e. Bush, Pataki, Perry), and others received a spread of donations (i.e. Graham, Huckabee, Santorum). This last pattern aligns more with the Democrat candidates (i.e. Clinton, O’Malley), who also appear to have a large spread of donation amounts.
I will create a new dataset (“NJ_money_by_candidate”) which will contain the mean, standard deviation and total sum of donation amounts by each individual candidate. This information will be plotted to show the total donation amount raised by each individual candidate.
## cand_nm mean sd sum
## 1 Bush, Jeb 2482.4561 635.0382 283000.00
## 2 Carson, Benjamin S. 182.8478 477.6293 42055.00
## 3 Clinton, Hillary Rodham 881.3203 1136.8932 954469.90
## 4 Cruz, Rafael Edward 'Ted' 182.1313 559.3187 47172.00
## 5 Fiorina, Carly 692.9474 893.0102 13166.00
## 6 Graham, Lindsey O. 1548.0000 1038.2758 38700.00
## 7 Huckabee, Mike 1257.1429 1034.3068 8800.00
## 8 O'Malley, Martin Joseph 1500.0000 979.1152 24000.00
## 9 Pataki, George E. 2547.0588 434.6229 43300.00
## 10 Paul, Rand 285.6818 488.7603 48565.90
## 11 Perry, James R. (Rick) 2700.0000 0.0000 5400.00
## 12 Rubio, Marco 226.2163 856.2588 47053.00
## 13 Sanders, Bernard 216.6811 255.7233 59803.97
## 14 Santorum, Richard J. 1605.5556 2301.6902 14450.00
Clinton (Dem) has raised the most money by far, with Bush (Rep) in second.
I have time data for donations, so the next question might be to look at patterns of donations over time, but first I need to ensure that the date information is being correctly recognized as a date.
## 'data.frame': 2435 obs. of 19 variables:
## $ cmte_id : Factor w/ 14 levels "C00458844","C00500587",..: 6 6 6 6 6 6 6 6 6 10 ...
## $ cand_id : Factor w/ 14 levels "P00003392","P20002721",..: 1 1 1 1 1 1 1 1 1 10 ...
## $ cand_nm : Factor w/ 15 levels "Bush, Jeb","Carson, Benjamin S.",..: 3 3 3 3 3 3 3 3 3 10 ...
## $ contbr_nm : Factor w/ 1353 levels "ABDELAZIZ, AL",..: 1208 230 842 1344 895 1223 755 74 894 260 ...
## $ contbr_city : Factor w/ 346 levels "ALLENDALE","ALLENHURST",..: 289 57 43 247 302 234 100 194 174 131 ...
## $ contbr_st : Factor w/ 1 level "NJ": 1 1 1 1 1 1 1 1 1 1 ...
## $ contbr_zip : int 70792116 70121939 70071406 88544546 85601724 76525505 70245022 70423025 77462751 7205 ...
## $ contbr_employer : Factor w/ 692 levels "","A&E STORES",..: 562 415 415 450 562 562 415 122 592 562 ...
## $ contbr_occupation: Factor w/ 445 levels "","ACADEMIC",..: 23 281 206 118 214 314 206 241 339 314 ...
## $ contb_receipt_amt: num 250 100 2700 2700 2700 2700 2700 2700 50 2700 ...
## $ contb_receipt_dt : Factor w/ 117 levels "01-Apr-15","01-Jun-15",..: 40 98 109 98 107 110 28 95 28 107 ...
## $ receipt_desc : Factor w/ 10 levels "","REATTRIBUTION / REDESIGNATION REQUESTED (AUTOMATIC)",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ memo_cd : Factor w/ 2 levels "","X": 1 1 1 1 1 1 1 1 1 1 ...
## $ memo_text : Factor w/ 21 levels "","* EARMARKED CONTRIBUTION: SEE BELOW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ form_tp : Factor w/ 3 levels "SA17A","SA18",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ file_num : int 1015585 1015585 1015585 1015585 1015585 1015585 1015585 1015585 1015585 1015617 ...
## $ tran_id : Factor w/ 2435 levels "A032126DA61AA40D699B",..: 491 1215 436 1167 900 1220 513 836 516 2108 ...
## $ election_tp : Factor w/ 2 levels "G2016","P2016": 2 2 2 2 2 2 1 2 2 2 ...
## $ cand_party : chr "Democrat" "Democrat" "Democrat" "Democrat" ...
The ‘contb_receipt_dt’ variable is not stored as a date but as a factor. This needs to be changed. I will create a new variable (“Date”) that contains the date in the correct format.
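A minimal sketch of the conversion, assuming the day-month-year layout seen in the output (for example "12-Apr-15"). Note that `%b` matches abbreviated month names in the current locale, so this assumes an English locale.

```r
# Toy factor in the same layout as contb_receipt_dt.
contb_receipt_dt <- factor(c("12-Apr-15", "27-Apr-15", "30-Jun-15"))

# Convert factor -> character -> Date; %d = day, %b = month abbreviation, %y = 2-digit year.
Date <- as.Date(as.character(contb_receipt_dt), format = "%d-%b-%y")

class(Date)
range(Date)
```

Going through `as.character()` first makes the conversion explicit and avoids any reliance on the factor’s internal integer codes.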
## 'data.frame': 2435 obs. of 20 variables:
## $ cmte_id : Factor w/ 14 levels "C00458844","C00500587",..: 6 6 6 6 6 6 6 6 6 10 ...
## $ cand_id : Factor w/ 14 levels "P00003392","P20002721",..: 1 1 1 1 1 1 1 1 1 10 ...
## $ cand_nm : Factor w/ 15 levels "Bush, Jeb","Carson, Benjamin S.",..: 3 3 3 3 3 3 3 3 3 10 ...
## $ contbr_nm : Factor w/ 1353 levels "ABDELAZIZ, AL",..: 1208 230 842 1344 895 1223 755 74 894 260 ...
## $ contbr_city : Factor w/ 346 levels "ALLENDALE","ALLENHURST",..: 289 57 43 247 302 234 100 194 174 131 ...
## $ contbr_st : Factor w/ 1 level "NJ": 1 1 1 1 1 1 1 1 1 1 ...
## $ contbr_zip : int 70792116 70121939 70071406 88544546 85601724 76525505 70245022 70423025 77462751 7205 ...
## $ contbr_employer : Factor w/ 692 levels "","A&E STORES",..: 562 415 415 450 562 562 415 122 592 562 ...
## $ contbr_occupation: Factor w/ 445 levels "","ACADEMIC",..: 23 281 206 118 214 314 206 241 339 314 ...
## $ contb_receipt_amt: num 250 100 2700 2700 2700 2700 2700 2700 50 2700 ...
## $ contb_receipt_dt : Factor w/ 117 levels "01-Apr-15","01-Jun-15",..: 40 98 109 98 107 110 28 95 28 107 ...
## $ receipt_desc : Factor w/ 10 levels "","REATTRIBUTION / REDESIGNATION REQUESTED (AUTOMATIC)",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ memo_cd : Factor w/ 2 levels "","X": 1 1 1 1 1 1 1 1 1 1 ...
## $ memo_text : Factor w/ 21 levels "","* EARMARKED CONTRIBUTION: SEE BELOW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ form_tp : Factor w/ 3 levels "SA17A","SA18",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ file_num : int 1015585 1015585 1015585 1015585 1015585 1015585 1015585 1015585 1015585 1015617 ...
## $ tran_id : Factor w/ 2435 levels "A032126DA61AA40D699B",..: 491 1215 436 1167 900 1220 513 836 516 2108 ...
## $ election_tp : Factor w/ 2 levels "G2016","P2016": 2 2 2 2 2 2 1 2 2 2 ...
## $ cand_party : chr "Democrat" "Democrat" "Democrat" "Democrat" ...
## $ Date : Date, format: "2015-04-12" "2015-04-27" ...
Now that it is in Date format, we can plot the data. This plot is the donation amount by time, with each line representing a different party (Republican or Democrat).
The first candidate to receive donations was a Republican, and it looks like Democrats did not start to receive donations until about April 2015. Let’s subset this plot to just look at time points after April.
From mid April to early June, Democrats were receiving larger donations than Republicans; however, around mid June this trend seems to fade, with both parties receiving equal donation amounts. There is one early spike for Republicans in late May, but that looks more like an outlier than a true increase. Republicans over this period appear to be growing in donation amounts, whereas the Democrats (while fluctuating) seem to remain mostly the same.
If geom_smooth is used instead, the plot might reflect this general pattern.
This plot loses a lot of information about contributions, but it does show the trend of increasing donations to Republicans and steady (maybe slightly decreasing) donations to Democrats over time.
We can also look at this information for each individual candidate, separated by party affiliation.
Interesting. The plot is too messy to make too much of it, but it is clear that people came into the race at different time points. Republicans were the first to come forward to declare they were running for President, with Paul declaring very early and Cruz coming in second. Democrats don’t declare until almost a full 6 months later, with Clinton being the first to receive donations. Around April, we see that a lot of Republican candidates start receiving contributions, suggesting they all declared their intentions around this time.
However, the plot is too messy to make any sense of the data, but maybe if we plot the mean value of the donations rather than just the donations the plot might be a bit smoother.
This is a slightly better plot of the data. If I plot the sum instead of the mean, does it look any better?
This is good, but I think I’m going to subset it to get a better look at the donations starting from April, where a lot of the data is plotted, and keep it as the sum instead of the mean.
There are late entrants on the Republican side who receive high donation amounts on average (i.e. Bush and Pataki), but they have little data as they only started receiving donations around June (when they must have declared their intentions to run for President), so the amount of time they could have received donations is less than that of Clinton or Paul.
For the Democrats, Clinton is consistently receiving larger donations than Sanders, with O’Malley varying more than the other two. The Republican data is a little more inconsistent than the Democrat data, and this could in part be because they have a larger number of candidates for donations to be spread among. But there is one individual who seems to be on a downward trend, receiving lower amounts of donations over time (Huckabee), while another appears to be increasing the donation amount with time (Graham).
Again using geom_smooth to give a general idea about the overall pattern of the data might be a good idea.
This isn’t as clear as it was for the party affiliation, and this is mostly due to the large number of candidates for the Republican party. The Democrats are mostly stable lines for Clinton and Sanders, while O’Malley has a little decrease and wider variance (standard error) than the other two candidates.
For the Republicans, a lot of individuals have wide SEs. There are trends for people going up and down, as hypothesized above. Interestingly, Bush in this graph has a flat line with wide SEs and does not appear to be increasing as in the previous plot. This may be due to the low number of data points for Bush, which are highly variable.
There are a lot of cities in the dataset, so I’m only going to look at the cities that have the most donations (the top 10 cities for frequency in the dataset). Earlier I made a new dataset with the counts of the number of occurrences of each city (called count_CITY). I’m going to use this information to create a new dataframe (“NJ_top_cities”) which only contains the top 10 cities that occur most frequently in the data (have the most contributing individuals). Then, using this new dataframe, I will calculate the total amount of donations for each city for each candidate by party affiliation.
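The steps just described could be sketched in base R as follows. The rows are made-up toy data, `top_n`, `top_names` and `city_totals` are hypothetical names, and the sketch keeps the top 2 cities rather than 10 so that the toy example stays small.

```r
# Toy data with made-up amounts; column names follow the dataset.
NJ <- data.frame(
  contbr_city = c("PRINCETON", "PRINCETON", "HOBOKEN", "HOBOKEN", "MEDFORD", "PRINCETON"),
  cand_nm = c("Clinton, Hillary Rodham", "Bush, Jeb", "Clinton, Hillary Rodham",
              "Clinton, Hillary Rodham", "Paul, Rand", "Clinton, Hillary Rodham"),
  cand_party = c("Democrat", "Republican", "Democrat", "Democrat", "Republican", "Democrat"),
  contb_receipt_amt = c(100, 2700, 50, 25, 10, 200)
)

# Count occurrences of each city, then keep the most frequent ones.
count_CITY <- as.data.frame(table(NJ$contbr_city))
top_n <- 2  # the analysis in the text uses 10
top_names <- head(count_CITY[order(-count_CITY$Freq), "Var1"], top_n)
NJ_top_cities <- subset(NJ, contbr_city %in% top_names)

# Total donations per city, candidate and party.
city_totals <- aggregate(contb_receipt_amt ~ contbr_city + cand_nm + cand_party,
                         data = NJ_top_cities, FUN = sum)
names(city_totals)[4] <- "sum"
city_totals <- city_totals[order(city_totals$contbr_city, city_totals$cand_nm), ]
city_totals
```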
## contbr_city cand_nm cand_party sum
## 1 CHERRY HILL Carson, Benjamin S. Republican 650.00
## 2 CHERRY HILL Clinton, Hillary Rodham Democrat 15717.80
## 3 CHERRY HILL Cruz, Rafael Edward 'Ted' Republican 20.00
## 4 CHERRY HILL Fiorina, Carly Republican 500.00
## 5 CHERRY HILL Paul, Rand Republican 250.00
## 6 CHERRY HILL Rubio, Marco Republican 250.00
## 7 CHERRY HILL Sanders, Bernard Democrat 746.88
## 8 CHERRY HILL Santorum, Richard J. Republican 2700.00
## 9 HOBOKEN Bush, Jeb Republican 2700.00
## 10 HOBOKEN Clinton, Hillary Rodham Democrat 25844.64
## 11 HOBOKEN Cruz, Rafael Edward 'Ted' Republican 520.00
## 12 HOBOKEN Graham, Lindsey O. Republican 2700.00
## 13 HOBOKEN Rubio, Marco Republican 700.00
## 14 HOBOKEN Sanders, Bernard Democrat 1510.00
## 15 JERSEY CITY Carson, Benjamin S. Republican 250.00
## 16 JERSEY CITY Clinton, Hillary Rodham Democrat 21721.00
## 17 JERSEY CITY Cruz, Rafael Edward 'Ted' Republican 250.00
## 18 JERSEY CITY Graham, Lindsey O. Republican 4600.00
## 19 JERSEY CITY Paul, Rand Republican 1000.00
## 20 JERSEY CITY Sanders, Bernard Democrat 2060.00
## 21 LIVINGSTON Bush, Jeb Republican 24300.00
## 22 LIVINGSTON Clinton, Hillary Rodham Democrat 11736.75
## 23 LIVINGSTON Cruz, Rafael Edward 'Ted' Republican 222.00
## 24 LIVINGSTON Graham, Lindsey O. Republican 1000.00
## 25 LIVINGSTON Rubio, Marco Republican 2700.00
## 26 LIVINGSTON Sanders, Bernard Democrat 1000.00
## 27 LIVINGSTON Santorum, Richard J. Republican 100.00
## 28 MEDFORD Carson, Benjamin S. Republican 150.00
## 29 MEDFORD Clinton, Hillary Rodham Democrat 3951.80
## 30 MEDFORD Paul, Rand Republican 1001.60
## 31 MEDFORD Rubio, Marco Republican 36.00
## 32 MEDFORD Sanders, Bernard Democrat 855.00
## 33 MONTCLAIR Bush, Jeb Republican 250.00
## 34 MONTCLAIR Clinton, Hillary Rodham Democrat 41881.96
## 35 MONTCLAIR Fiorina, Carly Republican 500.00
## 36 MONTCLAIR O'Malley, Martin Joseph Democrat 1000.00
## 37 MONTCLAIR Paul, Rand Republican 500.00
## 38 MONTCLAIR Sanders, Bernard Democrat 3055.00
## 39 MORRISTOWN Bush, Jeb Republican 5400.00
## 40 MORRISTOWN Clinton, Hillary Rodham Democrat 25233.90
## 41 MORRISTOWN Cruz, Rafael Edward 'Ted' Republican 1425.00
## 42 MORRISTOWN Huckabee, Mike Republican 450.00
## 43 MORRISTOWN Paul, Rand Republican 195.00
## 44 MORRISTOWN Sanders, Bernard Democrat 251.88
## 45 PRINCETON Bush, Jeb Republican 11800.00
## 46 PRINCETON Clinton, Hillary Rodham Democrat 45137.41
## 47 PRINCETON Fiorina, Carly Republican 500.00
## 48 PRINCETON Graham, Lindsey O. Republican 2750.00
## 49 PRINCETON Rubio, Marco Republican 5860.00
## 50 PRINCETON Sanders, Bernard Democrat 2572.00
## 51 RIDGEWOOD Bush, Jeb Republican 10800.00
## 52 RIDGEWOOD Carson, Benjamin S. Republican 1000.00
## 53 RIDGEWOOD Clinton, Hillary Rodham Democrat 15507.25
## 54 RIDGEWOOD Paul, Rand Republican 201.60
## 55 RIDGEWOOD Rubio, Marco Republican 250.00
## 56 RIDGEWOOD Sanders, Bernard Democrat 757.81
## 57 WEST ORANGE Carson, Benjamin S. Republican 250.00
## 58 WEST ORANGE Clinton, Hillary Rodham Democrat 8644.41
## 59 WEST ORANGE Cruz, Rafael Edward 'Ted' Republican 385.00
## 60 WEST ORANGE Fiorina, Carly Republican 266.00
## 61 WEST ORANGE Graham, Lindsey O. Republican 3000.00
## 62 WEST ORANGE Paul, Rand Republican 474.66
## 63 WEST ORANGE Rubio, Marco Republican 776.00
## 64 WEST ORANGE Sanders, Bernard Democrat 402.00
First plot the cities by total donation amount by party affiliation.
Next plot the total donation amounts by city for each individual candidate.
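As a rough illustration of these two plots, the sketch below rebuilds a small slice of the summary table (the Livingston and Medford rows printed above) and draws grouped bar charts with base R. The data frame name `city_totals` and the use of `xtabs()` and `barplot()` are assumptions made for this example; the column names `contbr_city` and `sum` are also assumed, and the original analysis may well have used a plotting package such as ggplot2.

```r
# Rows copied from the summary table above (Livingston and Medford only).
# `city_totals` is a hypothetical name for that summary data frame.
city_totals <- data.frame(
  contbr_city = c(rep("LIVINGSTON", 7), rep("MEDFORD", 5)),
  cand_nm = c("Bush, Jeb", "Clinton, Hillary Rodham",
              "Cruz, Rafael Edward 'Ted'", "Graham, Lindsey O.",
              "Rubio, Marco", "Sanders, Bernard", "Santorum, Richard J.",
              "Carson, Benjamin S.", "Clinton, Hillary Rodham",
              "Paul, Rand", "Rubio, Marco", "Sanders, Bernard"),
  cand_party = c("Republican", "Democrat", "Republican", "Republican",
                 "Republican", "Democrat", "Republican",
                 "Republican", "Democrat", "Republican",
                 "Republican", "Democrat"),
  sum = c(24300.00, 11736.75, 222.00, 1000.00, 2700.00, 1000.00, 100.00,
          150.00, 3951.80, 1001.60, 36.00, 855.00),
  stringsAsFactors = FALSE
)

# Total donations per party within each city (rows: party, columns: city).
party_by_city <- xtabs(sum ~ cand_party + contbr_city, data = city_totals)

# Total donations per candidate within each city.
cand_by_city <- xtabs(sum ~ cand_nm + contbr_city, data = city_totals)

# Grouped bars: one group per city, one bar per party or candidate.
barplot(party_by_city, beside = TRUE, legend.text = TRUE,
        ylab = "Total donations (USD)",
        main = "Donations by city and party")
barplot(cand_by_city, beside = TRUE, legend.text = TRUE, las = 2,
        ylab = "Total donations (USD)",
        main = "Donations by city and candidate")
```

Printing `party_by_city` alongside the plot makes it easy to check which party leads in each city before reading the bars.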
The top 10 contributing cities give more to Democrats than to Republicans, with most of the donations going to Clinton (Dem). Livingston is the exception: Bush received more there (24,300.00) than Clinton (11,736.75). In the other nine cities Clinton received more in total than any other candidate.
I am going to examine the occupation information in the same way I just looked at cities. Again, because there are a lot of occupations in the dataset, I'm only going to look at the occupations that have the most donations (the top 10 occupations by frequency in the dataset). Earlier I made a new dataset with the number of occurrences of each occupation (called count_OCCUPATION). I can use this to create a new dataframe called "NJ_top_occu" which contains only the top 10 most frequently listed occupations. With this new dataframe, I will calculate the total donation amount for each candidate and party affiliation by occupation.
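The filtering and summing steps described here could look roughly like the following base R sketch. The toy data frame `NJ` and its values are made up for illustration, `n_top` is a hypothetical parameter (10 in the write-up, 2 here so the toy data shows the effect), and the structure of `count_OCCUPATION` is an assumption, since only its name appears in the text.

```r
# Toy stand-in for the cleaned contributions data; the values are invented.
NJ <- data.frame(
  contbr_occupation = c("RETIRED", "RETIRED", "RETIRED",
                        "ATTORNEY", "ATTORNEY", "PHYSICIAN"),
  cand_nm = c("Clinton, Hillary Rodham", "Bush, Jeb",
              "Clinton, Hillary Rodham", "Clinton, Hillary Rodham",
              "Rubio, Marco", "Carson, Benjamin S."),
  cand_party = c("Democrat", "Republican", "Democrat",
                 "Democrat", "Republican", "Republican"),
  contb_receipt_amt = c(100, 250, 50, 500, 200, 75),
  stringsAsFactors = FALSE
)

# Number of donations per occupation, most frequent first
# (assumed shape of count_OCCUPATION).
count_OCCUPATION <- sort(table(NJ$contbr_occupation), decreasing = TRUE)

# Keep the n_top most frequent occupations.
n_top <- 2
top_occupations <- names(head(count_OCCUPATION, n_top))
NJ_top_occu <- NJ[NJ$contbr_occupation %in% top_occupations, ]

# Total donation amount per occupation, candidate and party.
occu_totals <- aggregate(
  contb_receipt_amt ~ contbr_occupation + cand_nm + cand_party,
  data = NJ_top_occu, FUN = sum
)
names(occu_totals)[names(occu_totals) == "contb_receipt_amt"] <- "sum"
print(occu_totals)
```

Ranking by frequency rather than by total amount matches the description above, but it means an occupation with a few very large donations can be left out of the top 10.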
## contbr_occupation cand_nm
## 1 ATTORNEY Bush, Jeb
## 2 ATTORNEY Carson, Benjamin S.
## 3 ATTORNEY Clinton, Hillary Rodham
## 4 ATTORNEY Cruz, Rafael Edward 'Ted'
## 5 ATTORNEY Fiorina, Carly
## 6 ATTORNEY Graham, Lindsey O.
## 7 ATTORNEY Paul, Rand
## 8 ATTORNEY Rubio, Marco
## 9 ATTORNEY Sanders, Bernard
## 10 ATTORNEY Santorum, Richard J.
## 11 CEO Bush, Jeb
## 12 CEO Carson, Benjamin S.
## 13 CEO Clinton, Hillary Rodham
## 14 CEO Cruz, Rafael Edward 'Ted'
## 15 CEO Huckabee, Mike
## 16 CEO O'Malley, Martin Joseph
## 17 CEO Paul, Rand
## 18 CEO Rubio, Marco
## 19 CEO Sanders, Bernard
## 20 CONSULTANT Bush, Jeb
## 21 CONSULTANT Clinton, Hillary Rodham
## 22 CONSULTANT Paul, Rand
## 23 CONSULTANT Sanders, Bernard
## 24 HOMEMAKER Bush, Jeb
## 25 HOMEMAKER Carson, Benjamin S.
## 26 HOMEMAKER Clinton, Hillary Rodham
## 27 HOMEMAKER Cruz, Rafael Edward 'Ted'
## 28 HOMEMAKER Graham, Lindsey O.
## 29 HOMEMAKER Huckabee, Mike
## 30 HOMEMAKER O'Malley, Martin Joseph
## 31 HOMEMAKER Pataki, George E.
## 32 HOMEMAKER Rubio, Marco
## 33 INFORMATION REQUESTED Clinton, Hillary Rodham
## 34 INFORMATION REQUESTED Paul, Rand
## 35 INFORMATION REQUESTED Sanders, Bernard
## 36 INFORMATION REQUESTED PER BEST EFFORTS Bush, Jeb
## 37 INFORMATION REQUESTED PER BEST EFFORTS Carson, Benjamin S.
## 38 INFORMATION REQUESTED PER BEST EFFORTS Cruz, Rafael Edward 'Ted'
## 39 INFORMATION REQUESTED PER BEST EFFORTS Fiorina, Carly
## 40 INFORMATION REQUESTED PER BEST EFFORTS Rubio, Marco
## 41 INFORMATION REQUESTED PER BEST EFFORTS Santorum, Richard J.
## 42 LAWYER Clinton, Hillary Rodham
## 43 LAWYER Rubio, Marco
## 44 LAWYER Sanders, Bernard
## 45 NOT EMPLOYED Clinton, Hillary Rodham
## 46 NOT EMPLOYED Sanders, Bernard
## 47 PHYSICIAN Bush, Jeb
## 48 PHYSICIAN Carson, Benjamin S.
## 49 PHYSICIAN Clinton, Hillary Rodham
## 50 PHYSICIAN Cruz, Rafael Edward 'Ted'
## 51 PHYSICIAN Pataki, George E.
## 52 PHYSICIAN Paul, Rand
## 53 PHYSICIAN Rubio, Marco
## 54 PHYSICIAN Sanders, Bernard
## 55 RETIRED Bush, Jeb
## 56 RETIRED Carson, Benjamin S.
## 57 RETIRED Clinton, Hillary Rodham
## 58 RETIRED Cruz, Rafael Edward 'Ted'
## 59 RETIRED Fiorina, Carly
## 60 RETIRED Huckabee, Mike
## 61 RETIRED Pataki, George E.
## 62 RETIRED Paul, Rand
## 63 RETIRED Rubio, Marco
## 64 RETIRED Sanders, Bernard
## 65 RETIRED Santorum, Richard J.
## cand_party sum
## 1 Republican 17550.00
## 2 Republican 50.00
## 3 Democrat 90253.68
## 4 Republican 975.00
## 5 Republican 266.00
## 6 Republican 7600.00
## 7 Republican 474.66
## 8 Republican 2026.00
## 9 Democrat 1946.88
## 10 Republican 2500.00
## 11 Republican 5400.00
## 12 Republican 1495.00
## 13 Democrat 32935.00
## 14 Republican 250.00
## 15 Republican 500.00
## 16 Democrat 3700.00
## 17 Republican 495.00
## 18 Republican 3100.00
## 19 Democrat 250.00
## 20 Republican 8100.00
## 21 Democrat 35538.05
## 22 Republican 3101.60
## 23 Democrat 500.00
## 24 Republican 28650.00
## 25 Republican 2300.00
## 26 Democrat 61752.55
## 27 Republican 350.00
## 28 Republican 1850.00
## 29 Republican 2700.00
## 30 Democrat 2700.00
## 31 Republican 2700.00
## 32 Republican 863.00
## 33 Democrat 26120.00
## 34 Republican 9070.00
## 35 Democrat 2070.00
## 36 Republican 8100.00
## 37 Republican 5275.00
## 38 Republican 2485.00
## 39 Republican 2700.00
## 40 Republican 13520.00
## 41 Republican 2700.00
## 42 Democrat 32918.41
## 43 Republican 200.00
## 44 Democrat 402.00
## 45 Democrat 4483.00
## 46 Democrat 14977.05
## 47 Republican 2700.00
## 48 Republican 600.00
## 49 Democrat 20638.00
## 50 Republican 5950.00
## 51 Republican 16200.00
## 52 Republican 2153.20
## 53 Republican 500.00
## 54 Democrat 2000.00
## 55 Republican 36350.00
## 56 Republican 20235.00
## 57 Democrat 83794.48
## 58 Republican 7627.00
## 59 Republican 750.00
## 60 Republican 450.00
## 61 Republican 1000.00
## 62 Republican 8520.66
## 63 Republican 8711.00
## 64 Democrat 2330.33
## 65 Republican 1050.00
First plot the total donation amount by occupation and party affiliation.
Next plot the total donation amount by occupation and individual candidate.
Again, because Democrats have more donations overall, the plots show that Democrats are getting more donations from the top 10 occupations, but there are some interesting patterns. Not Employed donors gave only to Democrats, and Lawyers gave almost exclusively to Democrats (the only Republican donation is 200.00 to Rubio), whereas those that did not disclose what they did (Information requested per best efforts) donated only to Republicans and not Democrats. For Retired, Physicians and Homemakers, the split appears to be much closer to 50/50 than for other occupations. The candidate receiving the most money for almost all occupations is Clinton (Dem). The exceptions are those that did not disclose what they did (Information requested per best efforts), where Rubio (Rep) received the largest amount, and not employed individuals, who gave more money to Sanders (Dem) than to Clinton (Dem).
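The "closer to 50/50" observation can be checked directly from the table above. The sketch below copies the RETIRED rows (rows 55 to 65) and computes each party's share of the total; the object names `retired`, `party_totals` and `party_share` are introduced only for this example.

```r
# Party and sum for the RETIRED rows (55 to 65) of the table above,
# in the same order as printed.
retired <- data.frame(
  cand_party = c("Republican", "Republican", "Democrat", "Republican",
                 "Republican", "Republican", "Republican", "Republican",
                 "Republican", "Democrat", "Republican"),
  sum = c(36350.00, 20235.00, 83794.48, 7627.00, 750.00, 450.00,
          1000.00, 8520.66, 8711.00, 2330.33, 1050.00),
  stringsAsFactors = FALSE
)

# Total per party, then each party's fraction of all RETIRED donations.
party_totals <- tapply(retired$sum, retired$cand_party, sum)
party_share <- party_totals / sum(party_totals)
print(round(party_share, 3))
```

The same calculation can be repeated for any other occupation by swapping in its rows from the table.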
The first plot shows the donation amounts received over time by each party (Republican or Democrat). This plot shows the differences in donations made, particularly to the Republican party, whose donation amounts appear to be growing over time. It would be interesting to re-assess this data in 6 to 12 months' time to see if that trend continues.
Furthermore, the plot of donation amounts to individual candidates by party affiliation is also interesting. Clinton receives a consistent amount of donations over time, higher than Sanders, but the Republican side of this plot is more interesting than the Democratic side. Clinton has received the most money over time, and every Republican candidate receives smaller amounts than she does. The Republicans have more candidates, and the plot shows roughly when some of them declared their intention to run for President, since this is about when they start receiving donations. Bush is one of the last Republicans to start receiving donations. His line is rising at the very end of the plot and his donations are large on average, which might in part explain the upward trend seen in the previous plot.
The last plot chosen is the bar plot of total donations from the top ten contributing cities by party affiliation. Most cities donate largely to Democratic candidates, but Livingston is interesting because it has donated more to Republicans than to Democrats. This reflects differences in political leanings between cities in New Jersey. While most (9 out of 10) most heavily support (via monetary donations) a Democratic candidate, one town favors Republicans, suggesting that location, and potentially through it socio-economic status or other factors linked to location, plays a role in political support.
This is the first time that I have worked with this type of data, and I found it very interesting.
There were some anomalies in the data, particularly with regard to donation amounts. Some donations in the data are over the legal limit ($2700), and these are obviously in violation of the law. Having done an internet search on this, it appears these are normally refunded (as some donations in this dataset were), or part of the amount is transferred to become a donation in a spouse's name (as was also the case for some of the data in this dataset). For this reason, I chose to focus only on donations that I felt were valid, meaning those between $1 and $2700.
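A filter of this kind might be sketched as below. The amounts are made up, the names `NJ_raw`, `NJ_valid`, `min_amt` and `max_amt` are hypothetical, and treating both bounds as inclusive (and refunds as negative amounts) is an assumption rather than something stated in the text.

```r
# Invented amounts: a refund (negative), a sub-dollar value, valid
# donations, and one over the limit.
NJ_raw <- data.frame(contb_receipt_amt = c(-2700, 0.5, 1, 250, 2700, 5400))

# Bounds taken from the text; inclusive on both ends by assumption.
min_amt <- 1
max_amt <- 2700

# Keep only rows whose amount falls inside the accepted range.
NJ_valid <- subset(NJ_raw,
                   contb_receipt_amt >= min_amt &
                     contb_receipt_amt <= max_amt)
print(NJ_valid)
```

Comparing `nrow(NJ_raw)` with `nrow(NJ_valid)` shows how many records the filter drops, which is worth reporting alongside any totals.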
It was difficult to get to know the data and all the various columns of information. The date information needed to be reformatted to work correctly. Other information was incomplete: employer and occupation were only available for a subset of individuals. Candidate names also needed cleaning, which is a little worrying, and had to be updated for the donation totals to be accurate.
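For the date reformatting, one possible approach is sketched below using four `contb_receipt_dt` values and amounts from the first rows of the data shown at the top of this report. The format string `"%d-%b-%y"` is inferred from values such as `12-Apr-15`; the object names and the monthly grouping are assumptions for illustration. Because `%b` depends on the locale, the sketch sets the time locale to `"C"` so that English month abbreviations parse.

```r
# Dates and amounts from rows 1, 2, 3 and 5 of the head() output.
dates_raw <- c("12-Apr-15", "27-Apr-15", "29-May-15", "29-Jun-15")
amounts <- c(250, 100, 2700, 2700)

# %b (abbreviated month name) is locale dependent; "C" gives English names.
Sys.setlocale("LC_TIME", "C")

# Day, abbreviated month, two-digit year.
contb_date <- as.Date(dates_raw, format = "%d-%b-%y")

# Group by calendar month and total the donations, as a basis for a
# donations-over-time plot.
contb_month <- format(contb_date, "%Y-%m")
monthly_total <- tapply(amounts, contb_month, sum)
print(monthly_total)
```

Any value that fails to parse comes back as `NA`, so checking `sum(is.na(contb_date))` after the conversion is a quick way to spot unexpected date formats.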
I also assigned each candidate a party affiliation, to assess differences between the two main US parties in how individuals from NJ were donating. Not overly surprisingly, Democratic candidates were more successful at raising money in NJ than their Republican counterparts. I think this is because NJ has voted for the Democratic candidate in recent elections (the past 6), suggesting that the NJ population leans Democratic.
Most people were donating to Clinton above all other candidates, but the time series plots showed some increases on the Republican side, which might be due to individuals making contributions to Bush, who entered the race later than most other individuals. Re-assessment of this data in 6 months' time may show slightly different patterns if people continue to donate to Bush to the same extent as they had started here.
Overall I found this incredibly interesting, and will most likely download the data again in a few months' time to see how it has changed.